DEWS 2006 3A-i6 Finding Thai Web

Authors

  • Kulwadee SOMBOONVIWAT
  • Takayuki TAMURA
  • Masaru KITSUREGAWA
Abstract

As the Web is increasingly recognized as a culturally valuable social artifact, many nations are endeavoring to create national Web archives for long-term preservation. However, because the Web is borderless, gathering the information that belongs to a specific nation is challenging. This paper proposes language-specific web crawling (LSWC) as a method of creating Web archives for countries with distinct linguistic identities, such as Thailand. The LSWC strategy for selectively gathering Thai web pages from virtually anywhere on the Web is derived from static analyses of the Thai Web graph. The strategy is then evaluated on a crawling simulator using a large dataset.
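The abstract does not spell out the LSWC strategy itself. As a rough illustration of the general idea of language-specific crawling, the following is a minimal sketch of a focused crawler over an in-memory Web graph. It assumes that a page counts as Thai when at least half of its letters fall in the Unicode Thai block (U+0E00 to U+0E7F), and that links found on Thai pages are crawled before links found on non-Thai pages. The names `thai_ratio`, `is_thai`, `lswc_crawl`, the 0.5 threshold, and the two-level priority scheme are hypothetical and are not the authors' method.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Unicode Thai block (U+0E00..U+0E7F).
THAI_START, THAI_END = 0x0E00, 0x0E7F


def thai_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that belong to the Thai block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    thai = sum(1 for c in letters if THAI_START <= ord(c) <= THAI_END)
    return thai / len(letters)


def is_thai(text: str, threshold: float = 0.5) -> bool:
    # Hypothetical relevance test: a page is "Thai" if at least `threshold`
    # of its letters are Thai characters.
    return thai_ratio(text) >= threshold


def lswc_crawl(
    seeds: List[str],
    pages: Dict[str, str],
    links: Dict[str, List[str]],
    max_pages: int,
) -> Tuple[List[str], List[str]]:
    """Simulate a language-specific crawl over an in-memory Web graph.

    Hypothetical strategy: outlinks of Thai pages get priority 0, outlinks of
    non-Thai pages get priority 1; ties are broken in discovery order (FIFO).
    Returns (crawl order, Thai pages found).
    """
    frontier: List[Tuple[int, int, str]] = []  # (priority, discovery order, url)
    seen: Set[str] = set()
    order = 0
    for url in seeds:
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (0, order, url))
            order += 1

    crawled: List[str] = []
    thai_pages: List[str] = []
    while frontier and len(crawled) < max_pages:
        _, _, url = heapq.heappop(frontier)
        crawled.append(url)
        relevant = is_thai(pages.get(url, ""))
        if relevant:
            thai_pages.append(url)
        # Non-Thai pages are not discarded: their outlinks are simply queued
        # behind the outlinks of Thai pages.
        priority = 0 if relevant else 1
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                heapq.heappush(frontier, (priority, order, out))
                order += 1
    return crawled, thai_pages


if __name__ == "__main__":
    # Toy graph: English page "b" and Thai page "c" are both linked from "a".
    pages = {
        "a": "สวัสดีครับ ยินดีต้อนรับ",
        "b": "Hello world",
        "c": "ข่าววันนี้",
        "d": "Another English page",
        "e": "ประเทศไทย",
    }
    links = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
    crawled, thai_pages = lswc_crawl(["a"], pages, links, max_pages=5)
    print(crawled)
    print(thai_pages)
```

On this toy graph the crawler visits `e` (reached through the Thai page `c`) before `d` (reached through the English page `b`), even though `d` was discovered first. Keeping non-Thai pages in the frontier at a lower priority, rather than dropping them, lets the crawler still reach Thai pages that are linked only from foreign pages; whether the paper's strategy behaves this way is not stated in the abstract.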

Similar sources

Blind Evaluation for Thai Search Engines

This paper compares the effectiveness of two different Thai search engines by using a blind evaluation. The probabilistic-based dictionary-less search engine is evaluated against the traditional word-based indexing method. The web documents from 12 Thai newspaper web sites consisting of 83,453 documents are used as the test collection. The relevance judgment is conducted on the first five retur...

A Collaborative Framework for Collecting Thai Unknown Words from the Web

We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allows a group of interested users to participate in constructing a Thai unknown-word open dictionary. The proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown wor...

Classification of News Web Documents Based on Structural Features

The motivation of this work comes from the need for a Thai web corpus for testing our information retrieval algorithm. Two collections of news web documents are gathered from two different Thai newspaper web sites. Our goal is to find a simple yet effective method to extract news articles from these web collections. We explore the use of machine learning methods to distinguish article pages from...

Web Accessibility for Older Readers: Effects of Font Type and Font Size on Skim Reading Webpages in Thai

Most guidelines for making websites accessible for older people have been developed for the Latin alphabet. Currently, there are no web design guidelines for the Thai language or for Thai older people. Our research investigated the effect of font type and size in Thai on skim reading for Thai younger (21-39 years) and older (59-72 years) adults. There were two levels of font types (Conservative...

Serum concentrations of Krebs von den Lungen-6, surfactant protein D, and matrix metalloproteinase-2 as diagnostic biomarkers in patients with asbestosis and silicosis: a case–control study

BACKGROUND Asbestosis and silicosis are progressive pneumoconioses characterized by interstitial fibrosis following exposure to asbestos or silica dust. We evaluated the potential diagnostic biomarkers for these diseases. METHODS The serum concentrations of Krebs von den Lungen-6 (KL-6), surfactant protein D (SP-D), and matrix metalloproteinase-2 (MMP-2), MMP-7, and MMP-9 were measured in 43 ...

Publication date: 2006